DOCUMENT RESUME

ED 055 733    RE 003 827

AUTHOR          Guba, Egon G.; Stufflebeam, Daniel L.
TITLE           Evaluation: The Process of Stimulating, Aiding, and
                Abetting Insightful Action.
INSTITUTION     Indiana Univ., Bloomington. Measurement and
                Evaluation Center in Reading Education.
PUB DATE        Jun 70
NOTE            36p.
AVAILABLE FROM  Reading Program, School of Education, Indiana
                University, 202 Pine Hall, Bloomington, Indiana
                47401 ($1.50)

EDRS PRICE      MF-$0.65 HC-$3.29
DESCRIPTORS     Data Collection; *Decision Making; *Evaluation
                Criteria; *Evaluation Methods; *Evaluation
                Techniques; Program Evaluation; *Research
                Methodology



ABSTRACT 

Part 1 of this monograph discusses the status of 
educational evaluation and describes several problems in carrying out 
such evaluation: (1) defining the educational setting, (2) defining 
decision types, (3) designing educational evaluation, (4) designing 
evaluation systems, and (5) defining criteria for judging evaluation. 
Part 2 proposes an approach to educational evaluation which would 
alleviate problems of definition and design. Different types of 
decision settings and appropriate evaluation strategies are explored. 
Types of decisions are matched with types of evaluation. The 
structure of evaluation design including the collection, 
organization, analysis, and reporting of information is outlined. 
Finally, criteria for judging evaluations such as validity, 
reliability, significance, scope, and efficiency are presented. 
Figures are given, including the CIPP (context, input, process, and 
product) Evaluation Model. (AL) 






Evaluation: The Process of Stimulating, Aiding, and
Abetting Insightful Action





Egon G. Guba 




Indiana University 




Daniel L. Stufflebeam 




Ohio State University 







U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.







Monograph Series in Reading Education 
Indiana University 
Number 1, June 1970 

Carl B. Smith 
Series Editor 





Evaluation has become a key concept and an 
essential operation in education today. Without 
a comprehensive evaluation system and tech- 
niques to provide a continuous monitoring of 
all educational activity, it is unreal for educa- 
tion to speak of accountability. For that reason, 
educators need to examine the theories and the 
research that propose ways to evaluate educa- 
tional programs. This paper by Egon G. Guba 
and Daniel L. Stufflebeam was an important 
recent contribution to the discussion of how to 
develop a system for evaluating educational 
programs. 

In using this monograph during training 
seminars and conferences for evaluators of 
reading programs, the Measurement and 
Evaluation Center in Reading Education found 
the participants received considerable benefit 
from reading and discussing the concepts pre- 
sented. Many of the seminar participants sug- 
gested that the monograph be published for 
wide distribution. 

Thus, this paper was placed in the Monograph
in Reading series as a service to specialists
in reading and as a service to the wider 
audience of evaluators who will find the sug- 
gestions contained herein helpful in developing 
further theory and practice in educational 
evaluation. 



Carl B. Smith 
Indiana University 
April 1970 






Acknowledgements 

The Measurement and Evaluation Center in Reading Education wishes to thank Phi Delta Kappa 
for granting permission to reproduce this monograph. Gratitude is likewise due to Dr. John J. 
Horvat, Director of the Laboratory for Educational Development, for the support provided. Mrs. 
Catherine F. Siffin supervised the editing and production of the manuscript. This book was 
designed by Dermot McGuinne of the Indiana University Publications Office. 









Contents

Preface
Introduction

Part 1: The Status of Educational Evaluation
  The Educational Setting
  The Need for Improved Evaluations
  Problems in Educational Evaluation
    The Problem of Definition
    Problems in Defining Settings to be Served by Educational Evaluations
    Problems in Defining Decision Types
    Problems in Designing Educational Evaluations
    The Problem of Designing Evaluation Systems
    The Problem of Defining Criteria for Judging Evaluations
  Summation
  Footnotes

Part 2: A Proposal
  Premises
  Evaluation Defined
  Decision Settings and Evaluation Strategies
    Small versus Large Change
    High versus Low Understanding
    The Decision-Making Settings
  Types of Decisions
    Planning Decisions
    Structuring Decisions
    Implementing Decisions
    Recycling Decisions
  Types of Evaluation
    Context Evaluation
    Input Evaluation
    Process Evaluation
    Product Evaluation
  The Structure of Evaluation Design
    Focusing the Evaluation
    Collection of Information
    Organization of Information
    Analysis of Information
    Reporting of Information
    Administration of Evaluation
  A Total Evaluation System
  Criteria for Judging Evaluations
    Internal Validity
    External Validity
    Reliability
    Objectivity
    Relevance
    Significance
    Scope
    Credibility
    Timeliness
    Pervasiveness
    Efficiency
  Footnotes



Figures

1 The Relation of Evaluation to Decision-Making
2 Decision-Making Settings
3 Types of Decisions
4 Developing Evaluation Designs
5 The CIPP Evaluation Model




This paper was delivered at the Second 
National Symposium for Professors of Edu- 
cational Research, sponsored by Phi Delta 
Kappa at Boulder, Colorado, November 21, 
1968. The authors were asked to summarize, 
synthesize, and update some of their past 
writings about educational evaluation. 

A great deal of confusion and controversy 
regarding the relationship of evaluation 
methodology to research methodology exists. 
The authors of this paper have taken a rather 
specific position in this controversy, rejecting 
the proposition that evaluation is equivalent to 
research, that is, that the same assumptions 
and methodologies hold for the two fields. The 
writers assert that professors of educational 
research are largely to blame for the confusion 
and ineptness which persist in the field of 
evaluation. The authors think that many re- 
searchers make wrong assumptions about what 
an evaluation study should accomplish, and 
that, based on these erroneous assumptions, 
researchers foist bad advice upon unsuspecting 
and unsophisticated practitioners. As a consequence,
evaluations are usually useless, and 
practitioners are largely justified in the jaun- 
diced view they typically have taken about 
evaluation and its utility. 

The authors attempt to validate these asser- 
tions and to suggest some alternative ways of 
viewing evaluation. Their aim is to stimulate in- 
quirers and developers to help produce a new 
methodology which will permit more effective 
evaluation practice. At the very least, the au- 
thors hope to expose some of the more salient 
issues concerning evaluation and stimulate dis- 
cussion of them. 

The views presented here are drawn from a 
number of sources, including several of the 
authors’ own papers, 1 the writings of other 
experts in the field, and especially intensive 
discussions with a number of colleagues. 2 This 
paper is divided into two parts. In Part 1 the 
authors attempt to describe the field of educational
evaluation as it exists and to delineate 
six major problems which must be overcome if 
evaluation as a science is to be soundly ad- 
vanced. In Part 2 the authors attempt to re- 
spond to these problems. Essentially this re- 
sponse takes the form of a proposed new defi- 
nition of evaluation and the means through 
which this definition may be explicated and 
operationalized. Overall, the paper attempts to 
point out directions which other research 
methodologists can follow in advancing the 
theory and practice of educational evaluation. 

The authors wish to emphasize the tentative 
nature of these formulations, which are still in 
an early developmental state and are them- 
selves largely unevaluated. 

Egon G. Guba 
Daniel L. Stufflebeam 



THE EDUCATIONAL SETTING 



Education is highly valued as a means for 
meeting the social, economic, technological, 
and scientific needs of society as well as the 
intellectual needs of citizens. To fulfill this 
complex role educators must deal with a wide 
range of urgent problems, such as the inequal- 
ity of opportunity afforded to members of 
minority groups, riots in the cities, disillusion- 
ment of youth, and school dropouts. Education 
thus has a most difficult charge which requires 
the initiation of many innovative programs. 

To facilitate such educational innovation, 
society is annually providing billions of dollars 
through federal, state, and foundation pro- 
grams to education agencies at all levels. 
Examples of increased support to educa- 
tion include the Elementary and Secondary 
Education Act of 1965, the Headstart Program, 
the Follow-Through Program, the Educational 
Professions Development Act, and the Experienced
Teacher Fellowship Program. Many industries
and non-profit research enterprises are 
also developing education components, and 
education-industry combines have become 
commonplace. Clearly, in addition to new responsibilities,
education also has unprecedented
opportunities to improve and expand its 
programs. 

Along with the opportunities for change 
goes a responsibility to evaluate new plans and 
programs. Evaluation requirements are es- 
pecially evident in federal assistance programs. 
Here the law explicitly states that fund recipi- 
ents will make at least annual evaluation re- 
ports. 

Such requirements for evaluation are rea- 
sonable and long overdue. Funding agencies and 
the public have a right to know whether their 
huge expenditures for education are producing 
the desired effects. And, educators themselves 
need evaluative information to be sure the 
changes they induce are in fact improvements. 

To justify requirements for evaluation is not 
equivalent to operationalizing them, however. 
Educators must respond by establishing evalua- 
tion units, defining the roles of staff needed to 
operate these units, and recruiting and training 
personnel to fill those roles. They must determine
the evaluative questions to be answered, 
select or construct appropriate instruments, 
and select samples of the persons who are to 
respond to the instruments. They must provide 
means for organizing, analyzing, and reporting 
evaluative information; and they must define 
the evaluation schedule, policies, and budget. 
Last but not least, evaluators must develop 
working relationships with those who will pro- 
vide information for the evaluation as well as 
those who will receive and utilize the informa- 
tion. Clearly, the task of evaluating any educa- 
tional program is highly complex. 

THE NEED FOR IMPROVED EVALUATIONS 

How have educators responded to their new 
evaluation responsibilities? To what extent 
have they responded at all? And how good have 
their evaluation studies been? 

Without question, educators have made a 
massive response to requirements for evalu- 
ation. The multitude of evaluation reports now 
available from local schools, state education 
departments, regional educational laboratories, 
educational industries, and the like, is a drama- 
tic indication of the significant expenditures of 
time, effort, and money for the evaluation of 
educational programs. However, the increased 
activity alone has not met the need for effective
evaluations. While educators have been 
busy doing evaluations, these evaluations have 
not provided the information needed to support 
decision-making related to the programs being 
evaluated. 

Many of the completed evaluation reports 
contain only impressionistic information. 
Though such information may be pertinent to 
the concerns of decision-makers, it usually has 
lacked the level of credibility required by de- 
cision-makers to defend their decisions, and 
seldom has such information been of material 
use in arriving at important decisions. A case 
in point is the first annual report for Title I of 
the Elementary and Secondary Education Act. 3 
This report was highly important as it encom- 
passed thousands of Title I projects throughout 
the nation. However, it fell far short of being 
a useful document, for it was almost devoid of 
hard data. On the other hand, it did contain 
many anecdotal accounts in which persons who 
were responsible for conducting Title I activi- 
ties stated that they felt that their programs 
had been successful. Many of them speculated 
as to the reasons for the alleged successes. 
Though these anecdotes may have touched key 
issues related to improving the billion-dollar-per-year
Title I program, decision-makers in 
the Congress, the Office of Education, state 
education departments, and local school dis- 
tricts could hardly base important decisions on 
a few “possibly accurate" pieces of testimony. 

The situation is not much different in Title 
III of the Elementary and Secondary Education 
Act. Title III staff members in the U.S. Office of 
Education have continuously ranked the quality 
of Title III projects on a five-point scale for 
each of fifteen criteria. 4 The criterion relating 
to evaluation has consistently been ranked near 
the "poor" end of the scale and lower than 
thirteen of the other criteria— the exception 
being the criterion related to dissemination. One 
of the authors of this paper made an analysis 
of thirty-two Title III projects, and concluded 
that “It is very dubious whether the results of 
these evaluations will be of much use to any- 
one. They are likely to fit well, however, into 
the conventional schoolman's stereotype of 
what evaluation is: something required from on 
high that takes time and pain to produce but 
which has very little significance for action." 5 

Unlike the Title I and Title III evaluations 
referred to before, some evaluations provide 
for hard data. For example, the evaluation report
for New York City's Higher Horizons Program 6
used rigorous research procedures to 
compare the performance of an experimental 
group receiving the Higher Horizons Program 
with the performance of a control group 
matched to the experimental group on several 
counts. The basic conclusions contained in this 
nearly 300-page report were typical of findings 
for rigorous educational evaluations: "There 
were no significant differences." In sharp con- 
trast, however, the report also noted that the 
teachers and principals who had been involved 
in the program said that it was making dif- 
ferences so significant that the program simply 
could not be abandoned. 

Though the Title I, Title III, and Higher 
Horizons evaluations differed as to rigor, they 
were alike in one respect: none of them provided
much help to decision-makers for improving
the programs being evaluated. While 
only three examples of the deficiencies in current
evaluations have been cited, they are sufficiently
important ones to illustrate the point. 

In too many cases, evaluation reports provide 
little or no help to decision-makers, and deci- 
sion-making in and about education must re- 
main an arty endeavor. 



PROBLEMS IN EDUCATIONAL EVALUATION 

What is the explanation for this situation? Why 
is it that educators are failing to provide evalu- 
ations which are at the same time useful and 
scientifically respectable? Why is it that evalu- 
ations which adhere to classical research 
methods provide information which is of only 
limited help in making decisions about pro- 
grams, and why do the typical “no significant 
difference" findings in so many of these evalu- 
ations contravene the experiences of those who 
are intimately involved in the programs? 

One cannot answer these questions simply 
on the grounds that evaluation practice lags 
too far behind evaluation theory, or that there 
is a lack of effort on the part of educators to 
evaluate their programs. Further, it is not 
enough to note that evaluation testimony given 
by witnesses is not credible, or that typical 
findings of "no significant differences" are correct
because nothing in education ever makes a 
difference. Rather, the lack of adequate evaluation
information probably persists because of 
several fundamental impediments which must 
be removed before educators can improve their 
evaluations. These impediments include the 
lack of trained evaluators and training pro- 
grams, the lack of appropriate evaluation in- 
struments and procedures, and the lack of ade- 
quate evaluation theory. This latter lack is, the 
authors believe, crucial. 

Clearly, the conceptual bases for evaluations 
are of fundamental importance. If these con- 
ceptions are faulty, then the evaluations which 
are based on them must also be faulty. Thus, 
it is highly important to identify and examine 
the efficacy of conceptualizations which under- 
lie current needs for evaluation as well as edu- 
cators' attempts to meet these needs. It will be 
useful to divide these conceptualizations into 
six classes and to consider each one separately. 
The six classes are: 

1. The definition of educational evaluation. 

2. The nature of the educational settings with- 
in which evaluations must be conducted. 

3. The definition of information requirements 
for educational evaluation. 

4. The structure of evaluation designs. 

5. The structure of evaluation systems. 

6. The definition of criteria for judging evaluations.






The Problem of Definition 

Evaluation, like any analytic term, can be defined
in many essentially arbitrary ways. The 
question is not so much that of the "right" way 
to define evaluation, but how we can recognize 
the contribution that different definitions can 
make to our thinking and how to devise a par- 
ticular definition that suits the purposes and 
needs in mind. 

In its earlier days educational evaluation 
was largely equated with measurement. This is 
not surprising in view of the history that preceded
it. Evaluation really came into its own 
during the twenties following upon the heels 
of the very successful measurement movement. 
Abraham Kaplan is fond of what he calls the 
"Law of the Instrument," viz., 

Give a small boy a hammer, and he will 
find that everything he encounters needs 
pounding. 7 

Thus it was natural that following the 
successful invention and adoption of standard- 
ized tests it was found that everything needed 
to be tested. The accumulation of scores and 
the statistical manipulation of those scores to 
produce that pseudo-standard called norms 
made it possible to form many judgments; this 
process came to be called evaluation. 

This definition had the advantage of stress- 
ing the importance of reliability, validity, and 
objectivity in collecting and interpreting data; 
it had great disadvantages in that it ignored 
the judgmental aspect of evaluation and that it 
tended to eliminate as unimportant those variables 
for which instruments were not readily available.

Another definition which received wide cur- 
rency, and is still the backbone of most evalu- 
ative thinking today, is that formulation which 
regards evaluation as a process for deter- 
mining the congruence of performance and ob- 
jectives. All school programs should be guided 
by behavioral objectives; indeed, it is the es- 
sence of program planning to project objectives 
and the essence of curricular planning to project
a series of experiences through which the 
pupil can achieve the objectives. Similarly it is 
the essence of evaluation to determine whether 
the objectives were in fact met. 

This definition has certain advantages and 
disadvantages. Clearly, it is possible in these 
terms to focus not only on the student but also 
on the program. If objectives are not met, it is 
not a foregone conclusion that it is the student 
who is out of step. Thus feedback is encour- 
aged leading perhaps to diagnosis and re- 
mediation of students, but just as possibly, 
leading to curricular change and refinement. 

But at the same time evaluation is pointed 
by this definition toward outcomes; one cannot 
evaluate until performance has taken place and 
can therefore be compared to objectives. Thus 
while the definition implies feedback leading 
to refinement, such feedback typically cannot 
occur until the termination of the treatment 
being evaluated. The promise of the definition 
is thus often not attained. 

Moreover, the matter of criteria remains unresolved.
While standards are perhaps implicit 
in the statement of objectives, the source of the 
objectives is mystical. It is often asserted that 
they are "screened" through a philosophy and 
a psychology, but which philosophy and which 
psychology is unspecified, as is the meaning of 
the term "screen." Finally, this definition of 
evaluation places an overwhelming importance 
on behavioral objectives, encouraging the 
belief that only "ultimate" evaluations which 
trace everything back to improved student per- 
formance are worthy of the name. Today there 
is insistence on evaluating even national pro- 
grams like ESEA Title III this way, forgetting 
that there are other standards that might be 
applied with greater validity. Thus it would not 
occur to us to evaluate a carburetor by looking 
for changes in driver behavior, but in education 
an analogous process seems to make sense. 

A third definition of evaluation tends to 
equate evaluation with the judgmental pro- 
cess. If the equation of evaluation with mea- 
surement can be scored for ignoring the value 
dimension of evaluation, then surely the equa- 
tion of evaluation with judgment can be scored 
for ignoring the processes of arriving at information.
Yet this procedure is fairly common, as, 
for example, in the evaluation processes of 
accrediting associations such as the North Central
Association or the American Association of 
Colleges for Teacher Education, where the 
judgment rendered by a visitation team is the 
evaluation, or in the panel review processes 
utilized by many funding agencies, including 
the U.S. Office of Education, for evaluating 
proposals. While this method has the advantages
of quick response and the utilization of 
the full range of the evaluator's competence, 
it obviously leaves much to be desired in terms 
of objectivity and validity, which are at best 
moot. 

None of these three definitions is thus en- 
tirely satisfactory. Each has certain advantages 
which should be retained, but each also has 
certain disadvantages which are at best annoy- 
ing and at worst devastating. Clearly a more 
defensible formulation is required. 

Problems in Defining Settings to be Served by 
Educational Evaluations 

Let us examine the problems involved in pro- 
viding an adequate focus for educational evalu- 
ation studies. Obviously, to evaluate one must 
know something about the program within 
which the evaluation is to be conducted. Gain- 
ing such knowledge, however, is a difficult task 
at best. Current needs for educational evaluation
have arisen in relation to programs and 
activities which are new to the field of educa- 
tion. Such activities involve responsibilities 
newly assigned to educators, new kinds of rela- 
tionships among different kinds and levels of 
agencies, and a need for cooperative decision- 
making about education among a variety of 
educational and non-educational agencies. It 
should come as no shock if the evaluation the- 
ory which has traditionally been viewed as 
appropriate for education is found no longer to 
be adequate to meet the information require- 
ments in new educational settings. Clearly, 
many of the new programs in education are 
dramatically different from those of the past, 
and the new programs themselves differ greatly
from each other. Probably different evaluation
strategies will be needed for different educational
settings. Before these evaluation strategies
can be developed, however, the different 
kinds of educational settings within which 
evaluations are to be conducted must be con- 
ceptualized. 

Problems in Defining Decision Types 

Even if adequate conceptualizations of the dif- 
ferent educational settings to be served by edu- 
cational evaluation existed, there is insufficient 
knowledge of the information requirements to 
be met by educational evaluation. What types 
of questions must be answered by evaluation 
studies, and how can they be classified so as to 
facilitate the development of a generalizable 



set of evaluation designs? Programs to im- 
prove education depend heavily upon a variety 
of decisions, and a variety of information is 
needed to make and support those decisions. 

Evaluators charged with providing this in- 
formation must have adequate knowledge 
about the relevant decision processes and as- 
sociated information requirements before they 
can design adequate evaluations. They need to 
have knowledge about the place, focus, timing, 
and criticality of decisions to be served. At 
present no adequate formulation of decision 
processes and associated information require- 
ments relative to educational programs exists. 
Nor is there any ongoing program to provide 
this knowledge. In short, there are no adequate 
conceptualizations of decisions and associated 
information requirements or programs to pro- 
duce them. 

Problems in Designing Educational Evaluations 

If current conceptions of evaluation are not 
adequate for evaluating current educational activities,
neither can extant designs be adequate. 
Recall the kinds of designs educators use to 
evaluate their programs. If a design is used at 
all, it typically is an experimental design. The 
fundamental concern of experimental design is 
that data which are produced be internally 
valid, i.e., unequivocal. Several conditions are 
necessary to meet this criterion. The units to 
be measured should be randomly assigned to 
treatment and control conditions. For example, 
a set of students might be partitioned randomly 
into two groups, one to receive a new program, 
the other to receive the school's present offering
in the area to be served by the new program.
Next, the treatment and control conditions
must be applied and held constant 
throughout the period of the experiment, i.e., 
they must conform to the initial definitions of 
these conditions. The new or traditional program
conditions could not be modified in process,
because in that event one could not tell 
what was being evaluated. Also, all students in 
the experiment must receive the same amount 
of the treatment to which they are assigned, 
and care must be taken so that students receiving
one treatment are not contaminated by the 
other treatment. If contamination occurred, one 
could not tell later what had caused what. 
Therefore, until an experiment is completed, 
one must resist the temptation to apply the 
successful activities of one condition to students
receiving a different condition, even if 
the activities in the latter condition are obviously
failing. Finally, an instrument which is 
valid and reliable for the specified criterion 
variable must be administered after a certain 
period of time (usually a complete program cycle)
to subjects from both parts of the experiment.
Then, if all of the above conditions were 
met, one could use predetermined statistical 
procedures and decision rules to determine unequivocally
that there were or were not significant
differences between the experimental 
and control groups on the outcome variable of 
interest. 
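
The sequence of steps in an experimental design (random assignment, a fixed
treatment, a single end-of-cycle measurement, and a predetermined decision
rule) can be sketched in code. The sketch below is an illustration only and
is not drawn from the paper: the function names, the 0.05 threshold, and the
use of a permutation test as the "predetermined statistical procedure" are
all assumptions made for the example.

```python
import random
from statistics import mean


def randomly_assign(units, rng):
    """Randomly partition units into treatment and control groups of (nearly) equal size."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_p_value(treatment_scores, control_scores, rng, n_resamples=5000):
    """Two-sided permutation test on the difference in group means.

    Assumed stand-in for the unspecified statistical procedure in the text.
    """
    observed = abs(mean(treatment_scores) - mean(control_scores))
    pooled = list(treatment_scores) + list(control_scores)
    n_treat = len(treatment_scores)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treat]) - mean(pooled[n_treat:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


def decide(p_value, alpha=0.05):
    """Decision rule fixed before any data are collected (alpha is an assumed threshold)."""
    return "significant difference" if p_value < alpha else "no significant difference"
```

Note that nothing in this sketch can run until the end-of-cycle scores exist,
which is the property of the design that the following paragraphs criticize.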

On the surface, the application of experi- 
mental design to evaluation problems seems 
reasonable, as traditionally both experimental 
research and evaluation have been used to test 
hypotheses about the effects of treatments. 
However, there are four distinct problems with 
this reasoning. 

First, the application of experimental design 
to evaluation problems conflicts with the prin- 
ciple that evaluation should facilitate the con- 
tinual improvement of a program. Experi- 
mental design prevents rather than promotes 
changes in the treatment because treatments 
cannot be altered in process if the data about 
differences between treatments are to be un- 
equivocal. Thus, the treatment must accommo- 
date the evaluation design rather than vice 
versa, and the experimental design type of 
evaluation prevents rather than promotes 
changes in the treatment. It is probably unrealistic
to expect directors of innovative projects 
to accept conditions necessary for applying experimental
design. Obviously, they can't constrain
their treatment to its original definition 
just to ensure internally valid end-of-year 
evaluative data. Rather, project directors must 
use whatever evidence they can obtain to refine 
continually and sometimes to change radically 
both the design and its implementation. It is 
thus contended here that conceptions of evalu- 
ation are needed which would stimulate rather 
than stifle dynamic development of programs. 

A second flaw in the experimental design 
type of evaluation is that it is useful for making 
decisions after a project has run full cycle but 
almost useless as a device for making decisions 
during the planning and implementation of a 
project. The potential confounding variables
must either be controlled or eliminated 
through randomization if the study results are 
to have internal validity. However, in the typi- 
cal educational setting this is nearly impossible 
to achieve. For example, consider the following 
quotation from an evaluation report completed 
by Julian Stanley: 

Even if the program does have consider- 
able cumulative influence on a person's 
career, this may be slow in appearing and so 
interactive with other influences that it can- 
not be discerned clearly by the person him- 
self or by others. 

Nevertheless, we must use whatever evi- 
dence that can be adduced to determine 
whether or not such programs are worth re- 
peating and, if so, how they should be modi- 
fied in order to be more effective. Ideally, in 
the experimental design sense, we should 
conduct the program as a controlled experi- 
ment, with a well-matched control group that 
does not attend the institute, and follow up 
both groups for quite a few years in order to 
determine how they diverge. If recruiting begins
early enough and the applicant group is 
able enough to provide both groups at a sufficiently
high level, this might be done, 
though the "reactivity" of the disheartened 
rejectees, the self-fulfilling prophecy of the 
rejectees, and the inability to control the 
summer activities of the rejectees might 
undesirably affect the outcome of the experi- 
ment. Merely having on one's record the fact 
of attending a certain prestigious program, 
like displaying one's Phi Beta Kappa key, 
might be a powerful aid. . . . Our chief way 
of evaluating the success of the program is 
via reports from staff and participants, particularly
the latter. 8 

In the above quotation, Professor Stanley has 
pointed to many of the reasons why experi- 
mental design does not seem well suited to 
evaluation problems in education. In many
innovative programs there clearly are a multitude
of confounding factors which simply cannot
effectively be controlled.

But the difficulty pointed to here is more 
complex than one would infer from Stanley's 
statement. It is not just a matter of being unable,
in the real world, to satisfy all the requirements
posed by experimental design; it is
also a matter of being unwilling to do so. 
Evaluation is not interested only in determining
the relationship among variables in that best of
all possible worlds, the laboratory; it is also
concerned with determining what will happen 
in the worst of all possible worlds. Thus, far 
from wishing to screen out possible sources of 
interference, evaluation is actually concerned 
with inviting interference so that results under
the worst possible circumstances can also be 
assessed. 

A fourth problem inherent in the appli- 
cation of conventional experimental design is 
the possibility that while internal validity may 
be gained through the control of extraneous 
variables, such an achievement is accomplished
at the expense of external validity. If the
extraneous variables are tightly controlled, one
can have much confidence in the findings pertaining
to how an innovation operates in a controlled
environment. However, such findings
may not be generalizable to the real world at
all because in that world the so-called extraneous
variables operate freely. Clearly, it is important
to know how educational innovations operate
under real world conditions.

The Problem of Designing Evaluation Systems 

A fifth problem is that of providing institu- 
tional settings in which evaluation can occur as 
a matter of course. To meet the evaluative 
needs of educators it is necessary to provide 
both for continuous, systematic evaluation 
needs and for unpredictable, ad hoc information
needs which emerge in programs of
change. Certain routine and predictable information
requirements should be provided for
systematically just as attendance is taken and
achievement data is collected on a regular
basis. To handle such information needs de
novo each time they occur certainly is ineffi- 
cient. On the other hand an effective evalua- 
tion mechanism should possess the capacity of 
performing ad hoc studies when they are 
needed. To meet both of these conditions re- 
quires much more knowledge than is presently 
available about the role of evaluation mecha- 
nisms within educational programs. Where 
should such a unit be housed organizationally? 
What support is necessary for such a unit? What 
data should be collected routinely? What evaluative
services should be performed for other role
functionaries within the educational program?
What policies and guidelines should govern the
operations of the unit? These and many other
related questions should be answered if educational
agencies are to install and maintain the
types of evaluation units they need. 

The Problem of Defining Criteria for Judging 
Evaluations 

Finally, attention must be given to the matter 
of defining criteria for judging the worth of
evaluations. If inappropriate or insufficient criteria
are applied for this purpose, serious
trouble will result. The result may well be
faulty designs and useless reports. If, for example,
an evaluation design is selected solely upon
the basis of reliability and validity, valid
and reliable information might be produced at
a time when it is too late to be of any use in an
action program. Consider the following excerpt
from testimony pertaining to Title I evaluations
given before a Congressional committee by a
citizens' group in New York City:

We ask for amendments to render the re- 
quired evaluations of Title I projects meaning- 
ful. The Act states that evaluations must be 
made, not that they be utilized in future 
planning. In New York City this year, projects 
were recycled before last year’s evaluations 
were submitted. To be made more useful,
evaluations should have built into them al- 
ternatives and the recommendations of the 
evaluator. What is now an expensive exercise 
should be made a function to provide service 
to local school boards having the responsi- 
bility for making policy based on experience. 
American business would not survive if its 
consultants did not supply management with 
alternatives after reviewing the efficacy
of programs. 9



Here, the major concern seems to be that re- 
ports yielded by current evaluation programs 
are neither sufficiently specific nor timely to 
influence educational programs. Obviously, 
evaluations which do not at least meet these 
two criteria are of little use. 

SUMMATION 

This then concludes a review of the current 
needs and problems in educational evaluation. 
The authors have noted that completed evalua- 
tions have been far from adequate and have 




asserted that the fundamental problem is a lack
of adequate conceptualizations regarding the
nature of educational evaluation in the context
of the emergent programs of educational
change. In this regard, six theoretical problems
were discussed which the authors believe must
be solved before meaningful evaluation
methodology can be developed. These problems
were:

1. Inadequacies of present definitions of educational
evaluation.

2. A lack of understanding of the different
educational settings within which evaluation
must be conducted.

3. A lack of understanding of generalizable information
requirements which educational
evaluation studies must meet.

4. The lack of a valid structure for the generalizable
parts of evaluation design.

5. The lack of concepts needed to organize and
operate evaluation systems.

6. The lack of an appropriate set of criteria for
judging the worth of evaluation strategies,
designs, instruments, reports, etc.

In the next part of the paper a response is 
made to each of these problems by proposing 
some new conceptualizations to undergird the 
evaluations which the authors believe are
needed in programs of educational change.





Footnotes

1 See especially Daniel L. Stufflebeam,
"Evaluation as Enlightenment for Decision-Making,"
an address delivered at the Working
Conference on Assessment Theory, sponsored
by ASCD, Sarasota, Florida, January 1968; and
Egon G. Guba, "Evaluation and Change in Education,"
a paper prepared for the Elk Grove
Training and Development Center, Indiana
University, NISEC, May 1968.

2 The authors' ideas have profited from the
interchange afforded by the Phi Delta Kappa
National Study Commission on Evaluation,
whose membership consists, in addition to the
present authors, of Walter J. Foley, William J.
Gephart, Robert L. Hammond, Howard O.
Merriman, and Malcolm Provus. The writers
have also benefitted greatly from discussions
with John Horvat, David L. Clark, and Sidney
Eboch.

3 Public Law 89-10: The Elementary and Secondary
Education Act of 1965, Title I.

4 These criteria are listed on pp. 70-71 of the
current Title III guidelines (A Manual for Project
Applicants and Grantees, Washington,
D.C.: U.S. Office of Education, 1967).

5 Egon G. Guba, Evaluation and the Process
of Change, Notes and Working Papers Concerning
the Administration of Programs authorized
under Title III of Public Law 89-10, The
Elementary and Secondary Education Act of
1965 as amended by Public Law 89-750, April
1967, p. 312.

6 Wayne J. Wrightstone, et al., Evaluation of
the Higher Horizons Program for Underprivileged
Children, Cooperative Research Project
No. 1124, Bureau of Educational Research,
Board of Education of the City of New York.

7 Abraham Kaplan, The Conduct of Inquiry,
San Francisco: Chandler Publishing Company,
1964, p. 28.

8 Julian Stanley, Benefits of Research Design:
A Pilot Study, Final Report, Project No.
X-005, Grant OES-10-272, U.S. Department of
Health, Education and Welfare, Office of Education,
Bureau of Research, August 1966.

9 Citizens' Committee for Children of New
York, Inc., Newsletter, Statement of Mrs.
Nathan W. Levin, Chairman of the Educational
Services Section, before the Sub-Committee on
the Elementary and Secondary Education Act
of the Education and Labor Committee of the
House of Representatives, March 18, 1967.









To respond to the problems identified in Part 
1, this second part of the paper is divided into 
eight major sections. In the first section the
premises are presented upon which subsequent
conceptualizations are based. In the second
section a general definition of educational
evaluation is proposed. Section three contains
conceptualizations of different educational settings
and of corresponding evaluation strategies
required to deal with them. Section four
presents conceptualizations of four types of 
educational decisions, and section five pro- 
poses four different evaluation designs appro- 
priate to them. Section six is an effort to out- 
line the structure of evaluation design. In sec- 
tion seven the authors attempt to synthesize 
their conceptualizations of evaluation strategies, 
types, and design steps into a single model for 
an evaluation system which can meet con- 
tinuous information requirements of an ongoing 
educational program while still retaining re- 
sponsiveness to emergent, idiosyncratic evalu- 
ation needs. Finally, in section eight the writers 
suggest and define criteria which can be em- 
ployed to judge the worth of developed evalu- 
ation systems, study designs, and reports. 



PREMISES 

The general logic of the proposed model is 
shown in Figure 1. Program operations or ac- 
tivities are evaluated to influence decisions 
which influence program operations which are 
in turn evaluated, ad infinitum. Figure 1 also 
indicates that the evaluation process includes 
five steps: (1) focusing the evaluation to iden- 
tify the questions to be answered and the cri- 
teria to be employed in answering them, (2) 
collecting information, (3) organizing information,
(4) analyzing information, and (5) reporting
information.
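
The loop just described can be traced in a few lines of Python. This is only an illustrative sketch: the five step names are taken from the text, but the function name `trace_cycles`, the event strings, and the idea of running a fixed number of cycles are assumptions made for the example and are not part of the authors' model.

```python
from typing import List

# Step names follow the five steps listed in the text. The constant and
# function names are hypothetical, chosen only for this illustration.
EVALUATION_STEPS = (
    "focusing",
    "collecting",
    "organizing",
    "analyzing",
    "reporting",
)


def trace_cycles(n_cycles: int) -> List[str]:
    """List, in order, the events of n_cycles passes through the loop:
    program operations -> evaluation (five steps) -> decision -> operations."""
    events: List[str] = []
    for cycle in range(1, n_cycles + 1):
        # Operations are what gets evaluated in this pass.
        events.append(f"cycle {cycle}: program operations")
        # Evaluation proceeds through the five steps in sequence.
        for step in EVALUATION_STEPS:
            events.append(f"cycle {cycle}: evaluation ({step})")
        # The reported information feeds a decision, which shapes the
        # operations evaluated in the next cycle.
        events.append(f"cycle {cycle}: decision")
    return events


if __name__ == "__main__":
    for event in trace_cycles(2):
        print(event)
```

The point of the sketch is simply that the five steps are sequential within a cycle while the cycle itself repeats without a built-in end.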

Implicit in this logic are five premises which
form the basis for the proposed evaluation
model:



1. The purpose of evaluation is to provide information
for decision-making; to evaluate,
therefore, it is necessary to know what
decisions are to be served.

2. Different evaluation strategies are required
depending upon the nature of different
decision-making settings to be served;
thus, a valid model for educational evaluation
should be grounded in sound conceptualizations
of the different decision-making
settings to be served.






3. Within any decision setting, different types 
of decisions require different types of evaluation
designs; therefore, a generalizable and
efficient evaluation model should be based
upon a parsimonious conceptualization of
the types of decisions and evaluation designs
which are generalizable to all educational
decision settings.

4. While the content of different evaluation designs
varies, a single set of generalizable
steps can be followed in the design of any
sound evaluation. 

5. Because evaluation studies should answer 
questions posed by decision-makers, designs 
for such studies should satisfy criteria both 
of scientific adequacy and of practical utility. 



EVALUATION DEFINED 

Given these premises, evaluation is defined as 
follows: 

Evaluation is the process of obtaining and
providing useful information for making educational
decisions.

This statement contains six key terms. Each
of these terms will be defined at this point as
each one has significant implications for the
processes and techniques of evaluation.

Process is defined as a particular and continuing
activity subsuming many methods and involving
a number of steps or operations.

Particular attention should be paid to the 
fact that the evaluation process is conceived 
as continuing: in particular, it is not conceived
as terminal or as having a discrete
beginning and ending. It will be seen that
evaluation activities tend to be (a) sequential,
i.e., with each activity forming a logical
base for the next, and (b) iterative, i.e., recurrent
or cyclical. Evaluation is also conceived
as multifaceted, involving many different
methods or techniques. This dynamic,
complex conception of evaluation is in sharp
contrast to the relatively static, terminal, single-phase
conception of evaluation that is
current.

Obtaining is defined as making available 
through such processes as collecting, or- 
ganizing, analyzing, and reporting, and 
through such formal means as measurement,
data processing, and statistics.









[Figure 1: The Relation of Evaluation to Decision-Making. Labeled elements: Evaluation, Information Collection, Information Organization, Information Analysis.]



Providing is defined as fitting together 
into systems or subsystems that best serve 
the needs or purposes of the evaluation. The
evaluator as provider is concerned primarily
(but not exclusively) with meeting the ad hoc
criteria posed by his client, whatever those
may be, e.g., cost, staff availability, political
viability, and the like. To provide implies
familiarity with conventional techniques of
information reporting and transmission, as
well as a concern for developing new 
methods of client criterion identification and 
the adaptation of information thereto. The 
evaluator who acts as provider functions pri- 
marily as an interactor or interface, a fact 
which is perhaps the chief basis for the evalu- 
ator’s claim to a professional role, in the 
sense of a privileged relation to a client. It is 
his function to help the client identify his 
needs, to formulate his criteria, and then to 
order and highlight the evaluative data into
reports that best serve those needs in the 
framework of the evolved criteria. 

Useful is defined as appropriate to established
criteria agreed upon by the evaluator
and the user or client. In determining utility
the evaluator leans heavily upon certain
practical criteria such as timeliness and relevance.
Evaluation designs must be shaped by
considerations of utility rather than being
simply helter-skelter collections of easily available
or easily measured variables.

Information is defined as descriptive or
interpretive data about entities (tangible or
intangible) and their relationships. Webster
defines information as "knowledge acquired
in any manner; facts; data; learning; lore." 10
This definition is useful in reminding readers
that evaluation is concerned not only with
scientific findings but also with information
resulting from precedent or experience. The
definition also serves to remind that information
can be derived in a variety of ways.

Further, the phenomenology which the information
purports to describe or relate need
not always be measurable in the rigorous
sense; so-called intangibles are also eligible
for inclusion when required. If conventional
methods of obtaining information do not permit
measurement of intangibles, it is time to
extend the methodology rather than to
exclude the difficult variables.

Educational Decisions are defined as
choices among alternatives.

The term decision is the key term in the
entire definition of evaluation. As will be seen,
the derivation of the decision situations to be
served by an evaluation serves as the touchstone
for the design of all evaluation steps
and as the ultimate criterion for inclusion or
exclusion of any information or technique
which might be proposed. The evaluator as
provider is concerned with defining, together
with the client, the necessary decision questions
and the alternatives which exist within
each decision situation. The evaluator as obtainer
is concerned with the collection, organization,
analysis, and reporting of information
that illuminates each alternative, weighing
each one in terms of its utility as a decision alternative,
applying information to the selection
of an alternative, testing the selected alternative
for utility, and suggesting ways in which
the alternative might be improved or further
refined or abandoned in favor of some other
alternative.

So much for formal definitions. The reader
may be tempted to suggest that the proposed
definition of evaluation based on the decision-making
process is so different from current formulations
that it ought to be given a different
name than evaluation. He may well feel that
putting a new label on this process will prevent
many misunderstandings. After all, everyone
now conjures up certain mental images, and perhaps
certain attitudinal responses, when the
term evaluation is mentioned; would it not be
wise to avoid all the misinterpretations and
false imputations that might result from confusion
in the reader's mind between what is
meant here and what he has always understood
by the term?

No doubt some clarification might occur 
through this device, but a great many confusions
much more detrimental in the long run
would probably result. First of all, if the process
is not called evaluation, it will not be
associated in the reader's mind with the requirements
for evaluation that are being posed
all around him. He will not understand that
these formulations are responsive, for example,
to the requirement that he evolve a mechanism
to evaluate his Title III project. Thus the
reader may mistake the arena in which these 
formulations have application. 

Second, if some other name were to be 
used, the reader might come to believe that the 
process discussed here is bigger than evaluation;
evaluation might be seen as only a part of
a more complex process designed to aid decision-makers.
Readers will come to see that
evaluation, as that term is ordinarily used, 
looks very much like what is called product 
evaluation later in this paper. Product evalu- 
ation serves a particular kind of decision need 
(recycling decisions), to be sure, but there are 
other kinds of decisions as well. It would be 
unfortunate indeed if evaluation were allowed 
to carry only this restricted meaning. 

Third, and perhaps most important, the pro- 
cess described as evaluation here comes much 
closer to the root meaning of the term, to 
evaluate, than does the process which currently 
masquerades under the name; one might argue 
that if a name were to be changed it ought to 
be that of present practice. Values come most 
meaningfully into play when there are choices 
to be made, and the making of choices is the 
essential act of decision-making. What is pro- 
posed here is that the entire act of evaluation 
center on the criteria to be invoked in making 
decisions. It is through exposing such criteria
that guidance is obtained about the kinds of 
information which should be collected, how 
such information should be analyzed, and how 
it should be reported. The term evaluation 
seems to be particularly suited to the process 
as described here, as that process makes such
distinctive use of value concepts. 

Finally, it may be asserted that the pro- 
posed definition is not as foreign to contem- 
porary thought as it might at first appear. Sim- 
ilar definitions have been proposed elsewhere. 
Cronbach, for example, offers this formulation: 

To draw attention to its full range of functions,
we may define evaluation broadly as
the collection and use of information to make 
decisions about an educational program. 11 

Thus, the use of the term evaluation in this 
new sense seems more than justified on the 
grounds that the arena for evaluation continues 
to enjoy unequivocal demarcation; that the 
term is maintained in its broadest meaning; 
that the root of the term, value, receives maxi- 
mum prominence; and that the proposed usage 
already has some currency. 



DECISION SETTINGS AND EVALUATION 
STRATEGIES 

Given this new definition of evaluation with its 
emphasis on decision-making, it is necessary to 
describe educational decision-making to pro- 
vide a basis for conceptualizing a relevant 
methodology for evaluation. The first step will
be to define different decision-making settings
which, the authors believe, require different
evaluation strategies. In this connection, formulations
are based heavily on the work of
Braybrooke and Lindblom in the area of public
policy. 12 

Figure 2 summarizes a conceptualization of 
four generally different decision-making set- 
tings in education, differentiated through the 
intersection of two continua: "small" versus
"large" educational change, and "high" versus
"low" understanding to support change. The
utility of these two continua arises directly as
a consequence of the authors' definition of
evaluation. First, it is clear that the rigor and
extensiveness of an evaluation are likely to be
determined by the importance of the decision
which is to be serviced; this importance in turn
is gauged by the significance of the change to
be brought about through the execution of the
decision. Decisions with unimportant consequences
clearly would not demand the expense and
thoroughness in an evaluation study that would
be required by decisions which will have serious
consequences. Second, as an evaluator
goes about determining what information he
should obtain and provide, he must have in
mind the information that is already available
and the ability of his client to use it in its present
form. Evaluations must be more extensive
when there is little information available (or 
when the client cannot use available infor- 
mation in its present form). 

Given this brief rationale to justify the introduction
of these two continua (amount of
change and degree of understanding), let us
look at each in greater detail. They are combined
to produce the four different decision-making
settings of Figure 2, which require
four different evaluation strategies to service 
them. 




[Figure 2. Vertical axis: Understanding (high to low). Lower left cell: INCREMENTALISM. Activity: Developmental. Purpose: Continuous improvement. Basis: Expert judgment plus structured inquiry. Lower right cell: NEOMOBILISM. Activity: Innovative. Purpose: Inventing, testing, and diffusing solutions to significant problems. Basis: Conceptualization and heuristic investigation.]









Small versus Large Change 

The authors' basic rule for distinguishing between
"small" and "large" change is that the
more controversial a change, the larger and
more important it is. The reader can see how
this rule works by focusing on the issues posed 
in several disagreements in our society about 
the efficacy of present programs and rec- 
ommended changes. Some of these come to 
mind immediately, e.g., a federally-controlled 
education system, federal aid to public educa- 
tion, busing of children for equalizing educa- 
tional opportunities, decentralization of large 
city school districts, automation and "de- 
humanizing" of education, learning aided by 
chemotherapy, National Assessment, etc. 

While it would be very difficult to obtain consensus
positions with regard to any of these
issues, it would not be difficult to obtain agreement
that each one is important. Changes in
these and similar areas could result in major 
restructuring within education, but more 
significantly such changes could potentially 
produce results in the lives of individuals and in 
society at large which are at great variance 
with the results being produced by the present 
educational offerings. Many persons would 
view these changes as potentially so damaging 
that they would counsel— or demand— that such 
changes be introduced cautiously and gradually 
or that they be installed only after sufficient 
tryout information had been obtained to allow 
reliable and valid predictions of the effects of 
the changes. These issues, and others like them,
signify what the authors mean by "large"
changes. They involve major restructuring of
education and potentially can have significant
impact on variables considered important by
society.

At the opposite end of our continuum,
"small" changes are identified with unimportant
variables. Examples include employing new
teachers to fill present vacancies, purchasing a
new edition of the textbook series currently in
use, increasing the amount of attention devoted
to fractions in the current mathematics program,
adding wrestling to the present athletic
program, replacing all the blackboards with
green ones, replacing the present achievement
testing series with a new one, adding a public
relations official to the school staff, decreasing
the average class size, blacktopping the school
playground, etc. Changes such as these would
stimulate relatively little disagreement. Yet, taxpayers
and educators alike would agree that
decisions to effect even small changes such as
these should not be made blindly. They should
be supported by information which will in- 
crease the likelihood of choosing the most effi- 
cient and effective alternatives, whether the 
choice concerns the hiring of a teacher or the 
selection of textbooks. Small changes, then, are 
changes within education which potentially will 
have no significant impact on variables considered
important by society. Small changes
also are characterized by being serial in nature;
they result in small, stepwise shifts rather than 
large ones. 

High versus Low Understanding 

The second major continuum proposed by 
Braybrooke and Lindblom is:

. . . the degree to which the decision-makers 
can be supposed to understand all the features 
of the problem with which they are faced. 

Near one extreme, information is generally 
lacking; values (goals, objectives, constraints, 
side conditions) are neither well understood 
nor well reconciled, and intellectual capacity 
generally falls far short of grasping and 
thinking through the problem. Near the other 
extreme, all aspects of the problem are quite 
well grasped in the decision-maker's mind. 13

It is important to note from this quotation
that high understanding is composed of two
elements: relevant information and the decision-maker's
intellectual capacity to utilize
that information in the solution of practical
problems. Both elements are important in decision-making,
and the evaluator must be
equipped to cope with deficiencies in either
one.

The evaluator's role as obtainer is particularly
influenced by the information element.
Two requirements must be met if information
is to be adequate: a validated theoretical structure
and adequate, practical data about the
particular decision problem must exist. To the
extent that either or both of these requirements
is not met, the evaluator must strive to obtain
additional information.

As provider of information, the evaluator 
must be particularly concerned with the ability 
of the decision-maker to understand both the
theoretical and practical information which is
available to him. Even if there is adequate information
available, it can have little positive influence
on decision-making if it is in a form that
the decision-maker cannot understand. Therefore,
the evaluator has a very critical role in
fitting together, assessing, and translating
available information which has potential relevance
to the needs of the decision-maker.

It is thus clear that the design of evaluation 
should be grounded in knowledge about the
amount and importance of change to be 
effected and the amount and quality of under- 
standing which is available to support decision- 
making to effect the change. Only in this way 
can the evaluator be confident that his study 
will be useful. 

The Decision-Making Settings 

Figure 2 is suggested as an aid in understanding 
the general classes of decision-making settings 
within which evaluation studies must be conducted.
The "small versus large change" and
the "high versus low understanding" continua
have been combined to yield four classes of
educational decision-making settings: decisions
to effect large changes supported by a high
level of relevant understanding (the upper right
cell of Figure 2: Metamorphism); decisions to
effect small changes supported by a high level
of relevant understanding (the upper left cell
of Figure 2: Homeostasis); decisions to effect
small changes supported by a low level of relevant
understanding (the lower left cell of Figure 2:
Incrementalism); and decisions to effect
large changes supported by a low level of relevant
understanding (the lower right cell of Figure 2:
Neomobilism).
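
The four-cell classification above amounts to a simple lookup, sketched here in Python. The setting names and their positions come from the text; the enum names, the `SETTINGS` table, and the function `classify_setting` are hypothetical. The sketch also treats each continuum as a two-valued choice, which is a simplification of what the authors describe as continua.

```python
from enum import Enum


class Change(Enum):
    """Size of the educational change (simplified to two values)."""
    SMALL = "small"
    LARGE = "large"


class Understanding(Enum):
    """Level of understanding available to support the change."""
    LOW = "low"
    HIGH = "high"


# Cell assignments as stated in the text for Figure 2.
SETTINGS = {
    (Change.LARGE, Understanding.HIGH): "Metamorphism",
    (Change.SMALL, Understanding.HIGH): "Homeostasis",
    (Change.SMALL, Understanding.LOW): "Incrementalism",
    (Change.LARGE, Understanding.LOW): "Neomobilism",
}


def classify_setting(change: Change, understanding: Understanding) -> str:
    """Return the decision-making setting for a given pair of values."""
    return SETTINGS[(change, understanding)]


if __name__ == "__main__":
    for (change, understanding), name in SETTINGS.items():
        print(f"{change.value} change, {understanding.value} understanding: {name}")
```

In practice an evaluator would have to judge where a decision falls on each continuum before any such lookup could apply; the table only records the labels the authors attach to the four corners.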

Metamorphic decision-making denotes uto- 
pian activity aimed at producing complete 
changes in an educational system. Its guiding 
basis is an overarching theory which is neces- 
sary and sufficient to every detail of the pro- 
posed change, and which is completely under- 
stood in all its ramifications by the decision- 
maker. The decision-maker, moreover, must be 
capable of considering all relevant variables 
and collecting, analyzing, and synthesizing performance
data about these variables as the
change is being managed.

The probability favoring this kind of change
in any educational institution is indeed slim.
The utopian situation rarely exists in which
adequate theory and information systems to
support the application of the theory are pres- 
ent along with decision-makers who can assim- 
ilate and use the theory and the necessary 
information as a rationale to effect revolu- 
tionary changes. To the extent that such con- 
texts might exist, evaluation strategies needed 
to support them could be mainly of the total in- 
formation management system type. The ade- 
quate supply of knowledge which already ex- 
isted would be organized and stored for rapid 
retrieval whenever the decision-maker might 
call for it. 

Obviously, such utopian educational 
decision-making settings are mainly theoreti- 
cal. Therefore, they will not be dealt with fur- 
ther here. However, theoretical identification of 
such a setting serves a function as education- 
ists and especially critics of education are prone 
to act as if such change settings do in fact
exist. That is, many assume that adequate
theory and information are available for effect- 
ing whatever utopian changes might be desired 
and that decision-makers can obtain, under- 
stand, and use this information appropriately. 

If professional educators did not assume this, 
they certainly would take more pains to collect 
information to support the large changes they 
do attempt. 

Homeostatic decision-making denotes re- 
storative activity aimed at maintaining the nor- 
mal balance in an educational system and 
guided by technical standards and a routine, 
cyclical data collection system. Of the four 
types being considered, settings of this type are 
the most prevalent in education. The major 
function of educational administration and 
supervision is to maintain the normal balance
in the program, that is, to control the activity 
and to make adjustments as required to adhere 
to the specifications established for the pro- 
gram. Staff assignments, scheduling of students, 
and establishment of bus routes illustrate this 
type of decision-making.

Homeostatic decision-making settings re- 
quire evaluation systems characterized by tech- 
nical standards and quality control data collec- 
tion systems. The most prevalent forms of rou- 
tine data collected for homeostatic decisions in- 
clude achievement data, attendance records, 
pupil-personnel data, staff records, and community
census data. Most schools have adequate
quality control evaluation systems to service
their homeostatic decision needs. Further,
the changes effected by these decisions are
small and remedial. All in all, no major breakthroughs
in evaluation theory are needed to
service such minor adjustments, which are already
based on adequate supplies of information.
Therefore this setting will not be considered
in further detail.

Incremental decision-making denotes de- 
velopmental activity having as its purpose con- 
tinuous improvement in a program. Such activity 
usually is supported by expert judgment and
structured inquiry into the efficacy of the present
program and the recommended changes.
Decision-making in this quadrant differs from
homeostatic decision-making in two respects.
First, incremental decisions are intended to shift
the program to a new normal balance based 
upon small, serial improvements, while homeo- 
static decisions are intended to correct the pro- 
gram and change it back to its normal balance. 
Second, while homeostatic decisions are supported
by technical standards and a continuing
supply of routinely collected information, 
evaluations for incremental change are usually 
ad hoc and supported by little extant know- 
ledge. Special studies, the employment of ex- 
pert consultants, and the formation of special 
committees characterize most efforts to intro- 
duce incremental change. 

Incremental decision-making is very preva- 
lent in education. Many so-called educational 
innovations are of the incremental type. They 
are attempts to make improvements in the 
present program without risking a major fail- 
ure. Though there is little information to sup- 
port such changes, the adjustments are suffi- 
ciently small that corrective adjustments can be 
made as problems are detected. As might be 
expected, such changes are based on trial and 
error and are iterative and serial in nature. 

Also, such changes often require allocations of 
special resources. Title I of the Elementary and 
Secondary Education Act fosters much incre- 
mental change. "Congruence evaluation" sys- 
tems are needed to support incremental change. 
Basically, such evaluation programs focus on 
the congruence between intended and actual 
increments of program change.

Neomobilistic decision-making denotes
innovative activity for inventing, testing, and 
diffusing new solutions to significant prob- 
lems. Such change is supported by little theory 
or extant knowledge; yet, the change is large,
often because of great opportunities such as
those being produced by the knowledge explosion,
or because of critical conditions such as
riots in inner cities. Evaluation systems to support
neomobilistic decision-making usually are
ad hoc, non-rigorous types of investigations.
Often, these studies are exploratory and heur- 
istic in nature. 

Neomobilistic decision-making is becoming 
more prevalent in education. Critics of educa- 
tion who advocate higher rates of change, the 
explosive conditions in our cities, and the 
knowledge explosion are all factors which
have served to motivate this kind of change. 

Title III projects and educational policy re- 
search centers engaged in long-range educa- 
tional planning are illustrations of expenditures 
of risk capital to stimulate educators to create
and to try out new ideas. To support this kind
of change, "contingency evaluation" systems 
are needed. Such systems should be heuristic. 
They should explore opportunities and possi- 
bilities. And, they should stimulate inventions 
of new solutions to critical educational prob- 
lems. 

TYPES OF DECISIONS 

Knowledge of the four decision-making set- 
tings is a necessary but not sufficient condition 
for formulating an evaluation model capable of 
serving decision-making. For within each de- 
cision-making setting one could identify literal- 
ly thousands of specific educational decisions, 
all of which might differ from each other in 
certain respects. Unless ways can be found to 
group these individual decisions, it will be nec- 
essary to contrive a different design for every 
conceivable decision. Then the notion of generalizable
evaluation designs would be meaningless,
and the development of evaluation designs
would always be ad hoc. Thus, the need
is to devise a typology or taxonomy of decisions
whose categories are exhaustive of all possible
educational decisions while also being mutually
exclusive. Under these circumstances, generalizable
evaluation designs to fit all decision
types within similar categories become feasible.

Figure 3 presents the conceptual base from
which the typology proposed is generated. The
authors postulate first that decisions should be
classified as a function of whether they pertain
to ends or means; this fact is depicted by the
row headings of Figure 3. The column headings
portray the second dimension which enters into
the typology: relevance of the decision to intentions
or actualities. Thus, all educational decisions
may be exhaustively and unambiguously
classified as pertaining to intended ends (goals),
intended means (procedural designs), actual
means (procedures in use), or actual ends (attainments).
As will be noted, this schema allows
the identification of four types of educational
decisions which are respectively serviced by
four special types of evaluation: planning decisions
to determine objectives; structuring decisions
to design procedures; implementing decisions
to utilize, control, and refine procedures;
and recycling decisions to judge and react to
attainments. Each type of decision is considered
at length.

[Figure 3. Row headings: Ends, Means. Column headings: Intended, Actual.]
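
The two-by-two schema described above can be sketched as a small lookup table. This is only an illustration, assuming the four cells are exactly those named in the text; the names `Focus`, `Status`, `TYPOLOGY`, and `classify_decision` are hypothetical and do not come from the monograph.

```python
from enum import Enum


class Focus(Enum):
    ENDS = "ends"
    MEANS = "means"


class Status(Enum):
    INTENDED = "intended"
    ACTUAL = "actual"


# Each cell of the schema: (what the decision concerns, decision type).
# The pairings follow the text; the structure itself is an assumption.
TYPOLOGY = {
    (Focus.ENDS, Status.INTENDED): ("goals", "planning"),
    (Focus.MEANS, Status.INTENDED): ("procedural designs", "structuring"),
    (Focus.MEANS, Status.ACTUAL): ("procedures in use", "implementing"),
    (Focus.ENDS, Status.ACTUAL): ("attainments", "recycling"),
}


def classify_decision(focus, status):
    """Return (subject, decision type) for one cell of the schema."""
    return TYPOLOGY[(focus, status)]


subject, decision_type = classify_decision(Focus.MEANS, Status.ACTUAL)
print(f"{decision_type} decisions concern {subject}")
```

Because every combination of the two dimensions has exactly one entry, the table is exhaustive and mutually exclusive in the sense the authors require of the typology.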

Planning Decisions 

Planning decisions specify major changes that 
are needed in a program. The need for planning
decisions arises from awareness of a lack of
agreement between what the program was in- 
tended to be and what it actually is, or aware- 
ness of a lack of agreement between what the 
program could become and what it is likely to 
become. In either case, decisions could be
made to change or not to change either inten- 
tions or actualities, pertaining either to means 
or ends. Any such decision to introduce change
would result in the establishment of program
objectives. In this paper, objectives which pertain
to changes in either the intended or actual
means will be referred to as instrumental objectives,
and objectives which pertain to
changes in either the intended or actual ends 
will be referred to as consequential objectives. 
(Behavioral objectives are one example of the 
possible types of consequential objectives.) 

Planning decisions are illustrated by the following
questions: Should program goals be
changed? Should the present mission be sus- 
tained or changed? What are the top priority 
needs that the program should serve? What are 
the characteristics of the problems which must 
be solved in meeting the top priority needs 
served by the program? What behaviors should 
the students exhibit following their participation 
in the program? 

As may be inferred from these examples, the
authority for planning decisions usually, but
not always, resides with policy groups. Thus,
the authors come close to equating planning
decisions with policy decisions. Role functionaries
who make such decisions include
boards of education, school superintendents,
state superintendents of public instruction,
department chairmen, boards of regents, deans,
college presidents, regional educational laboratory
directors, directors of research and development
centers, the commissioner of education,
and the like. Obviously, teachers also
make planning decisions with regard to behavioral
outcomes.

The formulation of planning decisions has 
consequences that are both internal and external
to the program of interest. Consequences
that are internal to the program would usually
take the form of directives sent from policy
figures to subordinates. Such directives would 
give notice of new objectives and likely would 
specify modifications in program functions in 
order that the objectives be achieved. Conse- 
quences that are external to the program of in- 
terest would usually be in the form of proposals 
to funding agencies or other external groups 
which might have the capacity to aid or con- 
strain the program. Such proposals would 
likely seek funds, sanction, and/or endorse- 
ment. Clearly, planning decisions are of funda- 
mental importance to any program, and appro- 
priate evaluation mechanisms should be main- 
tained to provide information for the formula- 
tion of new objectives or the modification of 
existing ones.

Structuring Decisions 

Structuring decisions specify the means to
achieve the ends which have been established 
as a result of planning decisions. Specification 
of means must consider variables such as 
method, content, organization, personnel, 
schedule, facilities, and budget. Decisions 
about such variables arise from three sources: 
awareness of planning decisions which specify 
what the program is to achieve, awareness that 
there are alternative means available to 
achieve the specified outcomes, and awareness 
of the relative strengths and weaknesses of the 
available procedural alternatives. Given these 
three conditions, an action plan to achieve the 
desired objectives can be structured. 

It is noteworthy that structuring decisions
can result in the modification of the established
objectives. For while objectives are initially
based on needs, problems, or opportunities,
they may be and frequently are modified because
of realistic limitations on available
means to insure their achievement. For example,
suppose the aim is to eliminate the possibility of
further assassinations of American presidents.
While most citizens likely would support this
objective in the abstract, many of them clearly 
would reject the notion of achieving the goal 
at all costs. The National Rifle Association 
members and weekend hunters certainly will 
not succumb easily to a law which would prohibit
them from owning firearms, and certainly
no President will submit to solitary confine- 
ment. Yet these or similar means are theoreti- 
cally available to achieve this objective. Instead 
of supporting these radical means, gun enthusi- 
asts might readily support a law which would re- 
strict known criminals from owning weapons, 
and future Presidents might yield to a regula- 
tion that they participate in parades only in 
bullet-proof, enclosed vehicles. While such 
means would not eliminate the threat of assas- 
sination, they would materially reduce the 
probability of such an occurrence. As seen in 
this example, the relative acceptability of avail- 
able alternative means and their associated 
possible outcomes can serve effectively to
modify specified outcomes. Ends determine
means, and vice versa. 

An action plan based upon structuring decisions
is a comprehensive statement of outcomes
to be achieved, work to be performed,
and resources and time to be used. The specified
outcomes are those given by the planning
decisions, possibly as modified by structuring
decisions in the selection of means. The decisions
pertaining to work, resources, and time
take the form of PERT networks, job descriptions,
line-staff organizational plans, procedural
specifications, process and product evaluation
designs, and program budgets. Collectively,
such decisions provide the operating guidelines
needed to respond effectively to planning
or policy decisions.

Most structuring decisions are made not by 
policy makers but rather by operations mana- 
gers. Such managers include project directors, 
school principals, activity supervisors, and area 
coordinators. The usual function of policy ad- 
ministrators in structuring decision processes is 
to confirm that the structuring decisions of 
their subordinates are consistent with the es- 
tablished policy structure. 

Consequences of structuring decisions generally
include actions to operationalize a procedure.
Budget is allocated. Staff are recruited
and oriented to the intents of the activity.
Needed materials and facilities are obtained
and prepared. Management and clerical procedures
are developed, and responsibilities are
assigned. Finally, as any operations manager
will readily attest, a major time-consuming
consequence of structuring decisions is the seemingly
endless round of meetings and presentations designed
to orient staff and create interest and
goodwill among the activity's various publics.

Implementing Decisions 

Implementing decisions are those involved in 
carrying through the action plan. These deci- 
sions arise from two sources: knowledge of the 
procedural specifications, and continuing know- 
ledge of the relationship between procedural 
specifications and the actual procedures. These 
two kinds of information aid in process control. 

Implementing decisions involve many
choices regarding changes of ongoing procedures.
Questions illustrating this type of decision
include: Should the staff be retrained?
Should new procedures be instituted? Should
additional resources be sought? Should responsibilities
be reassigned to staff? Should the
schedule be modified? Should the public rela- 
tions activities be changed? Obviously, the 
making and execution of implementing deci- 
sions comprise much of the day-to-day respon- 
sibilities of operating any program. 

Authority for the implementing decisions is
vested largely in operations managers and
their designated representatives. Largely, these 
are the same role functionaries that were identi- 
fied above for the structuring decisions. Addi- 
tionally, those responsible to the operations 
manager such as teachers and counselors hold 
delegated powers as a part of their roles to 
make certain implementing decisions. 

Implementing decisions also have varied 
consequences. A role functionary performs his
work differently. Inservice training sessions are
conducted. Staff obtain a better understanding
of their individual and collective roles. Specialists
external to the program are consulted.
Newspapers publicize certain aspects of the
program. New personnel are added to the staff.
Personnel work overtime. The PERT schedule 
is updated. New materials are obtained. Facili- 
ties are adapted to emergent program needs. 








Although many of these consequences seem 
routine, it is clear that their cumulative effect 
can largely determine the success of a program. 
Therefore, operations managers need daily 
access to information which can shape their 
implementing decisions. 

Recycling Decisions 

Recycling decisions are the fourth and final 
type of decisions in our classification schema of 
educational decisions. These decisions are 
those used in determining the relation of attainments
to objectives and in determining
whether to continue, terminate, evolve, or dras- 
tically modify an activity. The essential type 
of awareness precipitating these decisions is 
knowledge of the nature and timing of speci- 
fied attainments. 

Basically, recycling decisions involve pro- 
duct control choices. Such decisions are usually 
thought to occur after a complete cycle of an 
activity. But this is a limited view of recycling 
decisions. More appropriately conceived, they 
occur throughout an activity as quality or pro- 
duct control devices. Therefore it is empha- 
sized that recycling decisions are concerned 
with attainments at any point in a program as 
opposed solely to outcomes following a full 
cycle of a program. Whereas implementing decisions
focus on the extent to which means are
operant as intended, recycling decisions focus 
on the extent to which desired ends are being 
and/or have been attained. 

Many questions illustrative of what is meant 
by recycling decisions can be posed. Are the 
students’ needs being met? Are problems 
being solved as intended? Is the program fail- 
ing? Was the outcome worth the investment? 
Has there been a significant gain in pupil 
achievement? Has the program benefited by 
using the opportunity that was presented? Has 
sufficient progress been achieved to warrant 
continuation of the program? Is the new program
succeeding? Were the results from Program A
better than those from Program B?
Was the procedure effective? Has the program
resulted in improved teacher competence?
Have school-community relations been improved?
Have students improved their self-concepts?
Questions such as these often must
be answered when operations managers are
attempting to justify new funding requests. 




Continuing to fund expensive procedures with- 
out answering such questions understandably 
is often frowned on by responsible fiscal 
agents. 

Authority for recycling decisions usually re- 
sides with the operations manager during the 
implementation of an activity cycle and with 
the responsible fiscal agent at the conclusion of 
an activity cycle. While the operations manager 
can make certain decisions about outcomes 
which might have policy or fiscal implications, 
he usually has very limited authority to make 
recycling decisions which would result in major 
policy or fiscal changes. Therefore, the policy 
maker is a key figure to be involved in recy- 
cling decisions. 

Recycling decisions have very tangible 
consequences. Program activities may be con- 
tinued at the same level of funding under the 
same product specifications; they may be dras- 
tically changed; or they may be discontinued. 
New funding proposals often are written as a 
result of recycling decisions. Present staff may 
be reassigned or discharged. Attempts may be 
initiated to diffuse or install the tested product 
into a broader context. The activity cycle that 
produced the product may be debugged and 
recycled. 

TYPES OF EVALUATION

Corresponding to each of these four decision 
types are four types of evaluation, which might 
be thought of as four generalizable evaluation
designs; the four types are given the names
context, input, process, and product. It might
be noted that the initial letters of these four
terms form the acronym CIPP (pronounced sip),
which is often used as a general name for the
formulations which are propounded here.
Context evaluation services planning decisions,
input evaluation services structuring decisions,
process evaluation services implementing decisions,
and product evaluation services recycling
decisions. 
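
The correspondence just stated can be written down as a one-to-one mapping. The sketch below is illustrative only; the names `EVALUATION_FOR` and `evaluation_for` are hypothetical, and the mapping simply restates the pairings given in the text.

```python
# Decision type -> evaluation type, in the order the text lists them.
EVALUATION_FOR = {
    "planning": "context",
    "structuring": "input",
    "implementing": "process",
    "recycling": "product",
}


def evaluation_for(decision_type):
    """Look up the evaluation type that services a given decision type."""
    return EVALUATION_FOR[decision_type.lower()]


# The acronym is formed from the initial letters of the four evaluation names.
acronym = "".join(name[0].upper() for name in EVALUATION_FOR.values())
print(acronym, evaluation_for("Recycling"))
```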

Context Evaluation 

The major objective of context evaluation is to 
define the environment in which change is to 
occur, to depict unmet needs, and to identify
the problems that result in needs not being
met. For example, the environment might be
defined as the inner city elementary schools of
a certain metropolitan area. A context evaluation
might reveal that children were not learning
to read at the level expected of them, and
it might further indicate that a particular problem,
or problems, was the cause of this failure,
e.g., instruction might be inadequate, materials
might not be appropriate, a language barrier 
might exist, there might be a high rate of ab- 
senteeism, and the like. Thus the children’s 
need to learn to read was being thwarted by 
certain particular problems. Environment, 
needs, and problems would all be involved in 
the context evaluation. 

The method of context evaluation begins 
with a conceptual analysis to identify and de- 
fine the limits of the domain to be served as 
well as its major subparts. Next, empirical 
analyses are performed, using techniques such 
as sample survey, demography, and standardized
testing. The purpose of this part of context
evaluation is to identify the discrepancies
between intended and actual situations for each
of the subparts of the domain of interest and
thereby to identify needs. Finally, context
evaluation involves both empirical and conceptual
analyses, as well as appeal to theory and
authoritative opinion, to aid judgments regarding
the basic problems underlying each
need.
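
The empirical step of context evaluation, comparing intended with actual situations for each subpart, can be sketched as follows. This is a minimal illustration under two assumptions that are not in the text: each subpart is summarized by a single number, and a higher number is better. The function name `identify_needs` and all figures are hypothetical.

```python
def identify_needs(intended, actual):
    """Return {subpart: shortfall} where the actual level falls short of the intended one."""
    needs = {}
    for subpart, target in intended.items():
        shortfall = target - actual[subpart]
        if shortfall > 0:
            needs[subpart] = shortfall
    return needs


# Hypothetical figures, invented for illustration only.
intended = {"grade 3 reading level": 3.0, "grade 4 reading level": 4.0}
actual = {"grade 3 reading level": 2.5, "grade 4 reading level": 4.0}
print(identify_needs(intended, actual))
```

A discrepancy found this way identifies only the need; as the text notes, judging the problems that underlie it still calls for conceptual analysis, theory, and authoritative opinion.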



Input Evaluation 

The major objective of input evaluation is to 
determine how to utilize resources to meet 
program goals. This objective is accomplished 
by identifying and assessing relevant capabili- 
ties of the proposing agency, strategies which 
may be appropriate for meeting program goals, 
and designs which may be appropriate to utilize
a selected strategy. The end product of input
evaluation is an analysis of alternative procedural
designs in terms of potential costs and
benefits. Specifically, alternative designs are
assessed in terms of their staffing, time, and
budget requirements; their potential procedural
barriers; the consequences of not overcoming
these barriers; the possibilities and costs of
overcoming them; relevance of the designs to
program objectives; and overall potential of the
design to meet program goals. Essentially, input
evaluation provides information for deciding
whether outside assistance should be
sought for meeting goals and objectives; what
strategy should be employed, e.g., the adoption
of available solutions or the development of
new ones; and what design or procedural plan 
should be employed for implementing the se- 
lected strategy. 

Methods for input evaluation are lacking in 
education. The prevalent practices include 
committee deliberations, appeal to the professional
literature, and the employment of consultants.
In a few areas, formal instruments
exist to aid decision-makers in making input
decisions. In the design of testing programs,
for example, one may obtain substantial help
by referring to Buros' Mental Measurements
Yearbooks.14 Or in educational research,
researchers who want to select an experimental
design can receive material assistance in identifying
and assessing alternative experimental
designs by referring to the Campbell-Stanley
chapter on experimental design in Gage's
Handbook of Research on Teaching.15 In this
chapter, the decision situation posed to the researcher
in need of an experimental design is
neatly laid out in the form of alternatives 
which are relevant to experimental research. 
Each of these designs is rated on the basis of 
its potential to meet criteria of internal and ex- 
ternal validity as identified for each of the 
listed designs. 

Decisions based upon input evaluation 
usually result in the specification of proce- 
dures, materials, facilities, schedule, staff re- 
quirements, and budgets in proposals to fund- 
ing agencies. From the information provided in 
the proposals, the funding agencies in turn do 
an input evaluation to determine whether or 
not to fund the proposed projects. Funding 
agencies commonly employ expert consultants 
to serve as judges in their input evaluations. 

Process Evaluation 

Once a designed course of action has been approved
and implementation of the design has
begun, process evaluation is needed to provide
periodic feedback to project managers and
others responsible for continuous control and
refinement of plans and procedures. The objective
of process evaluation is to detect or predict,
during the implementation stages, defects
in the procedural design or its implementation.
The overall strategy is to identify and monitor,
on a continuous basis, the potential sources of
failure in a project. These include, but are not
limited to, interpersonal relationships among
staff and students; communication channels;
logistics; understandings of and agreement
with the intent of the program by persons involved
in and affected by it; and adequacy of
the resources, physical facilities, staff, and time
schedule.

Process evaluation does not require control
over assignment of subjects to treatments, nor
that the treatments be held constant. Its purpose
is to assist project personnel to make their
decisions a bit more rational in their continual
efforts to improve the quality of the program.
Thus, under process evaluation, the evaluator
accepts the program as it is and as it evolves
and monitors the total situation by focusing the
most sensitive and non-intervening data collection
devices and techniques that he can obtain
on the most crucial aspects of the project. Such
evaluation is multivariate, and not all of the
important variables can be specified before a
project is initiated. The process evaluator focuses
his attention on theoretically important
variates, but he also remains alert to any unanticipated
but significant events. Under process
evaluation, information is collected, organized
systematically, analyzed periodically, and
reported as often as project personnel require
such information, daily if necessary.

Thus, project decision-makers are not only
provided with information needed for anticipating
and overcoming procedural difficulties,
but also with a record of process information
to be used later for interpreting project outcomes.

Product Evaluation

The objective of product evaluation is to measure
and interpret attainments, not only at the
end of a project cycle but as often as necessary
during the project term.

The method is to define operationally and to
measure criteria associated with the objectives
of the activity, to compare these measurements
with predetermined absolute or relative standards,
and to make rational interpretations of
the outcomes using the recorded context, input,
and process information. Criteria for product
evaluation may be either instrumental or consequential,
a distinction pointed out earlier by
Scriven.16 Instrumental criteria are related to
program [illegible] the
achievement of behavioral objectives. So for
example, Clark and Guba have developed a
taxonomy of instrumental objectives and associated
criteria which are related to educational
change.17 Consequential criteria are primarily
those pertaining to behavioral objectives. By
way of example, Bloom's Taxonomy of Educational
Objectives18 is useful in the identification
of consequential objectives.

In the change process, product evaluation
provides information for deciding to continue,
terminate, modify, or refocus a change activity,
and for linking the activity to other phases of
the change process. For example, a product
evaluation of a program to develop after-school
study for students from disadvantaged homes
might show that the development objectives
have been satisfactorily achieved and that the
developed innovation is ready to be diffused to
other schools which need such an innovation.

THE STRUCTURE OF EVALUATION DESIGN

Once an evaluator has selected the type of
evaluation appropriate to the kind of decision
he intends to service, he must then develop a
design to implement the evaluation. This is a
difficult task, as the authors have already asserted,
because few generalized designs exist
which are adequate to meet evaluation needs.
These designs must therefore be generated de
novo. It should be noted, however, that it is
possible to develop these designs for all kinds
of evaluation, i.e., whether context, input, process,
or product evaluations, by going through
a series of identical steps. The checklist shown
as Figure 4 is offered for this purpose. The
structure proposed here has six major parts:
focusing the evaluation, information collection,
information organization, information analysis,
information reporting, and administration.
Each of these parts will be considered separately.

Focusing the Evaluation

The first element in designing an evaluation is
that of focusing. The purpose of this step is to
spell out the ends for the evaluation and to define
policies within which the evaluation must
be conducted. Focusing includes
three steps. The first step is to identify and
define the decision situations to be served.
Given our present low state of knowledge
about [illegible], this is a
very difficult task. However, it is a crucial step
and should be carried out with as much rigor
as practicable. Decision situations should be
identified in terms of questions to be answered.
Then one should identify those responsible for
making the decisions, e.g., teachers, principals,
board of education members, state legislators,
and the like. Next, the criteria and decision
rules to be employed should be identified. Then
the timing of the decision situation to be
served should be estimated so that the evaluation
can be geared to provide relevant data
prior to the time when the decision must be
made. Finally, an attempt should be made to
explicate each important decision situation in
terms of the alternatives which might reasonably
be considered.

Focusing the Evaluation

1. Define the decision situation(s) to be served, and describe each one in terms of its locus,
criteria, decision rules, timing, and decision alternatives.

2. Define the system to be evaluated.

3. Define the evaluation specifications.

Collection of Information

1. Specify each item of information that is to be collected.

2. Specify the populations, sources, and sampling procedures for information collection.

3. Specify the instruments and methods for information collection.

4. Specify the arrangements and schedule for information collection.

Organization of Information

1. Specify a format for organizing the information.

2. Specify a means for coding, organizing, storing, and retrieving the information.

Analysis of Information

1. Specify the procedures for analyzing the information.

2. Specify a means for performing the analysis of information.

Reporting of Information

1. Specify the audiences for the evaluation reports.

2. Specify formats for the evaluation reports and reporting sessions.

3. Specify a means for providing the information to the audiences.

4. Specify a schedule for reporting the information to the specified audiences.

Administration of the Evaluation

1. Summarize the evaluation schedule.

2. Define staff and resource requirements and plans for meeting these requirements.

3. Specify means for meeting policy requirements for conduct of the evaluation.

4. Appraise the potential of the evaluation design for providing information which is valid,
reliable, credible, timely, and pervasive.

5. Specify and schedule means for periodic updating of the evaluation design.

6. Provide a budget for the total evaluation program.

*The logical structure of evaluation design is the same for all types of evaluation, whether
context, input, process, or product.

Figure 4: Developing Evaluation Designs*

Once the decision situations to be served
have been explicated, the next step in the focusing
activity is to define the setting within
which the evaluation is to be conducted. Specifically,
one should define the system in terms
of its boundaries, its elements, and the charac- 
teristics of the elements. To return to an earlier 
example, the boundaries in a particular situa- 
tion may be the inner city schools of a certain 
metropolitan area; the elements may be de- 
fined as, say, the pupils, the teachers, the par- 
ents and other patrons of the school, the pro- 
gram, the facilities, and similar elements; while 
the characteristics of a particular element, say 
the pupils, might be defined to include age, 
grade placement, intelligence score, sibling or- 
der, native language, and the like. 

The third step in focusing the evaluation is 
to define policies within which the evaluation 
must operate. For example, one should determine
whether a "self evaluation" or "outside
evaluation" is needed. Also, it is necessary to
determine who will receive evaluation reports 
and who will have access to them. Finally, it is 
necessary to define the limits of access to data 
for the evaluation team. 

Collection of Information 

The second major part of the structure of 
evaluation design is that of planning the collection
of information. This section must obviously
be keyed very closely to the criteria which were 
identified in the focusing step. So for example, 
if cost is a criterion factor, one must be sure 
to collect cost information. 

Using those criteria, one should first identify
the sources of the information to be collected.
These information sources should be defined in
two respects: the origins for the information,
e.g., students, teachers, principals, or parents,
and the present state of the information, i.e., in 
recorded or non-recorded form. 

Next, one should specify instruments and 
methods for collecting the needed information. 
Examples include achievement tests, interview 
schedules, and searches through the profes- 
sional literature. Michael and Metfessel 19 have 
recently provided a comprehensive list of in- 
struments with potential relevance for data col- 
lection in evaluations which will be helpful in 
this connection. 

For each instrument that is to be adminis- 
tered, one should next specify the sampling 
procedure to be employed. Where possible, one 
should avoid administering too many instru- 
ments to the same person. Thus, sampling 
without replacement across instruments can be 
a useful technique. Also, where total test scores 
are not needed for each student, one might 
profitably use multiple matrix sampling where 
no student attempts more than a sample of the 
items in a test. 
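
The two sampling ideas in this paragraph can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names (`assign_instruments`, `matrix_sample`), the student and instrument labels, and the use of simple random assignment are hypothetical and are not taken from the text.

```python
import random


def assign_instruments(people, instruments, seed=0):
    """Sampling without replacement across instruments:
    each person is drawn for at most one instrument."""
    rng = random.Random(seed)
    pool = list(people)
    rng.shuffle(pool)
    per_instrument = len(pool) // len(instruments)
    assignment = {}
    for i, instrument in enumerate(instruments):
        start = i * per_instrument
        assignment[instrument] = pool[start:start + per_instrument]
    return assignment


def matrix_sample(students, items, n_forms, seed=0):
    """Multiple matrix sampling: split the items into n_forms subtests
    and give each student exactly one subtest."""
    rng = random.Random(seed)
    shuffled_items = list(items)
    rng.shuffle(shuffled_items)
    # Deal the items into n_forms disjoint subtests.
    forms = [shuffled_items[k::n_forms] for k in range(n_forms)]
    shuffled_students = list(students)
    rng.shuffle(shuffled_students)
    return {s: forms[j % n_forms] for j, s in enumerate(shuffled_students)}


# Hypothetical data: 12 students, 3 instruments, a 40-item test in 4 forms.
students = [f"student_{n}" for n in range(12)]
groups = assign_instruments(
    students, ["achievement_test", "interview", "questionnaire"]
)
plan = matrix_sample(students, items=range(1, 41), n_forms=4)
```

In this sketch no student answers more than one instrument, and no student attempts more than a quarter of the test items, yet every item is still attempted by some students.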

Finally, one should develop a master sched- 
ule for the collection of information. This 
schedule should detail the interrelations be- 
tween samples, instruments, and dates for the 
collection of information. 

Organization of Information 

A frequent disclaimer in evaluation reports is 
that resources were inadequate to allow for 
processing all of the pertinent data. To avoid 
this problem, definite plans should be made 
regarding the third part of evaluation design:
organization of information. Organizing the information
that is to be collected includes providing
a format for classifying information and
designating means for coding, organizing, stor- 
ing, and retrieving the information. 



Analysis of Information 

The fourth major part of evaluation design is 
analysis of information. The purpose of this
part is to provide for the descriptive or statisti- 
cal analyses of the information which is to be 
reported to decision-makers. This part also in- 
cludes interpretations and recommendations. 
As with the organization of information, it is
important that the evaluation design specify
means for performing the analyses. The role
should be assigned specifically to a qualified
member of the evaluation team or to an agency
which specializes in doing data analyses. Also,
it is important that those who will be responsible
for the analysis of information participate
in designing the analysis procedures.

[Figure 5: overview of the total evaluative program. Labels legible in the diagram: Change, Homeostatic, Process Evaluation, Product Evaluation, System, Structuring, Decisions, Installation, Termination.]

Reporting of Information

The fifth part of evaluation design is the re- 
porting of information. The purpose of this 
part of a design is to ensure that decision-makers
will have timely access to the information
they need and that they will receive it in a 
manner and form which facilitates their use of 
the information. In accordance with the policy 
for the evaluation, audiences for evaluation re- 
ports should be identified and defined. Then 
means should be defined for providing infor- 
mation to each audience. Subsequently, the 
format for evaluation reports and reporting 
sessions should be specified. Finally, a master 
schedule of evaluation reporting should be pro- 
vided. This schedule should define the interre- 
lations between audiences, reports, and dates 
for reporting information.

Administration of Evaluation 

The last part of evaluation design is that of 
administration of the evaluation. The purpose 
of this part is to provide an overall plan for
executing the evaluation design. The first step
is to define the overall evaluation schedule. For 
this purpose it often would be useful to employ
a scheduling technique such as Program Evaluation
and Review Technique (PERT). The
second step is to define staff requirements and
plans for meeting these requirements. The third
step is to specify means for meeting policy requirements
for conduct of the evaluation. The
fourth step is to evaluate the potential of the
evaluation design; criteria for such an evaluation
are given in a following section. The fifth
step is to specify and schedule means for periodic
updating of the evaluation design. The
sixth and final step is to provide a budget for
the evaluation.
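
As a rough illustration of how a PERT-style schedule might be worked out for the first step, the sketch below applies the standard PERT three-point estimate, (optimistic + 4 x most likely + pessimistic) / 6, and finds the longest chain of dependent activities. The activity names, durations in weeks, and dependencies are hypothetical and serve only as an example.

```python
from functools import lru_cache

# Hypothetical evaluation activities: (optimistic, most likely, pessimistic)
# durations in weeks, and the activities that must finish first.
ACTIVITIES = {
    "focus":    {"est": (1, 2, 3),  "after": []},
    "collect":  {"est": (4, 6, 14), "after": ["focus"]},
    "organize": {"est": (1, 2, 3),  "after": ["collect"]},
    "analyze":  {"est": (2, 3, 4),  "after": ["organize"]},
    "budget":   {"est": (1, 1, 1),  "after": ["focus"]},
    "report":   {"est": (1, 2, 3),  "after": ["analyze", "budget"]},
}


def expected_time(optimistic, likely, pessimistic):
    """Standard PERT three-point estimate of an activity's duration."""
    return (optimistic + 4 * likely + pessimistic) / 6


@lru_cache(maxsize=None)
def earliest_finish(name):
    """Own expected time plus the latest expected finish among predecessors."""
    activity = ACTIVITIES[name]
    start = max((earliest_finish(p) for p in activity["after"]), default=0.0)
    return start + expected_time(*activity["est"])


def critical_path(name):
    """Walk back from an activity through its latest-finishing predecessors."""
    path = [name]
    while ACTIVITIES[name]["after"]:
        name = max(ACTIVITIES[name]["after"], key=earliest_finish)
        path.append(name)
    return list(reversed(path))


total_weeks = earliest_finish("report")   # expected length of the schedule
path = critical_path("report")            # activities that determine it
```

Activities on the resulting path are the ones whose delay would delay the final report, which is the information a master schedule needs in order to deliver data before the decision must be made.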




A TOTAL EVALUATION SYSTEM 



Reliance upon ad hoc evaluation studies can 
prove to be an ineffective and inefficient means
of providing information for decision-making
within a system. Rather, educational systems
should have well-functioning evaluation programs
which provide a dynamic baseline of information
about the system. Such an evaluation
program should meet the regular evaluative
information requirements of the system,
and it should be responsive to emergent needs 
for idiosyncratic data. Figure 5 is presented as 
an overview of the total evaluative program 
being proposed herein which provides for sys- 
tematic context evaluation and ad hoc input, 
process, and product evaluations. 

The outer loop represents a continuous, systematic
context evaluation mechanism. This
mechanism provides information to the plan- 
ning body of a system for its use in making 
decisions either to change the system or to con- 
tinue with present procedures in the knowledge 
that it is serving important objectives effec- 
tively and efficiently. If the context evaluation 
indicates that there are no discrepancies be- 
tween the intentions and actualities or between 
possibilities and probabilities, the planning 
body likely would make choices which would 
result in a "steady-as-you-go" or "enlightened
persistence" state.

However, if the context evaluation indicates
that the program is deficient in some way, a 
rational decision-making body likely would 
decide to bring about changes to remove the 
deficiencies. Such changes, as indicated in
Figure 5, can be of four types.

Metamorphic change would be based upon
decisions to effect large changes in the program
supported by a high level of relevant understanding
concerning how to bring about such
changes.

Homeostatic change would be based upon
decisions to effect small changes supported
again by a high level of relevant understanding.

Incremental change would be based upon
decisions to effect a small change supported
by a low level of relevant understanding, and

Neomobilistic change would be based upon
decisions to effect large change supported by a
low level of relevant understanding.

Depending upon the type of change which
results from planning decisions, vastly different
evaluation measures might be needed. In response
to homeostatic or metamorphic change
where adequate information to support deci- 
sion-making is already available from the re- 
search literature and/or the context evaluation 
mechanism, it would be unwise to mount an 
expensive evaluation study to provide informa- 
tion which is redundant to that which already is 
available to the decision-maker. Therefore, our 
model shows (1) that decision-makers would
make structuring decisions regarding the
means necessary to bring about homeostatic or
metamorphic change without any intervening
evaluation support mechanism, other
than context evaluation, and (2) that these
structuring decisions would lead directly to in- 
stallation of change in the program and subse- 
quent adjustment of the context evaluation 
mechanism so that the new feature in the sys- 
tem would routinely be monitored by the syste- 
matic context evaluation. 

If, on the other hand, neomobilistic or incremental
changes are called for, there is a definite
need for ad hoc evaluation mechanisms to
support such change, for both the context 
evaluation mechanism and the research litera- 
ture provide inadequate supplies of informa- 
tion to support these types of changes. 
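
The relation between the four types of change and the evaluation support each one calls for can be summarized as a small lookup. The sketch below is only a restatement of the model in code form; the function names and the string labels are hypothetical conveniences.

```python
def change_type(size_of_change, understanding):
    """Classify a planned change by its size ('small' or 'large') and by the
    level of relevant understanding that supports it ('low' or 'high')."""
    table = {
        ("large", "high"): "metamorphic",
        ("small", "high"): "homeostatic",
        ("small", "low"): "incremental",
        ("large", "low"): "neomobilistic",
    }
    return table[(size_of_change, understanding)]


def evaluations_needed(kind):
    """Context evaluation runs continuously; ad hoc input, process, and
    product evaluations are added only when understanding is low."""
    if kind in ("incremental", "neomobilistic"):
        return ["context", "input", "process", "product"]
    return ["context"]


# A large change resting on little understanding needs the full ad hoc support.
support = evaluations_needed(change_type("large", "low"))
```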

First, an input evaluation study must be 
done to identify and evaluate strategies and 
procedures which could be used to effect de- 
sired changes. Such input evaluation informa- 
tion should assist decision-makers to make judg- 
ments in designing desired change procedures. 

In turn, the structuring decisions usually lead 
to some kind of a trial or pilot phase, for as 
yet, the desired change is an innovation, and 
has not been adequately tested; it is, therefore,
not ready for installation in the total system. 

Process and product evaluation are next in- 
cluded to aid in decisions pertaining to the trial 
phase. Process evaluation would provide information
for implementation decisions needed
for efficient operation of the trial, including 
the recycling of structuring decisions as neces- 
sary. Product evaluation would go on simul- 
taneously throughout the process of the trial in 
conjunction with process evaluation and would 
support recycling decisions which could lead 
to a reformulation of the change to be brought 
about, a modification either in strategy or procedure,
termination of the change effort, or
the installation of the innovation in the total
program. In the case of installation, again, the
context evaluation mechanism would be adjusted
to allow systematic monitoring of the new
element in the total system.

CRITERIA FOR JUDGING EVALUATIONS 

How can the evaluator evaluate his own acti- 
vity? The information which the evaluation 
produces is the key. What criteria are appropri- 
ate to it? 

This question can be answered in two parts. 

If evaluation produces information, then that 
information must meet criteria that are ordinarily
required of any good information, i.e.,
scientific criteria. But because it is evaluative
information, it must also meet certain special 
criteria of practical utility. 

The scientific criteria are these: 

Internal validity. The information provided
by the evaluation must display a reasonable 
correspondence to the phenomena which it 
purports to describe or interpret. It must 
have fidelity, or, in the layman’s sense, it 
must be true. 

External validity. The information must be
generalizable to similar situations beyond the
one in which it was collected. Particularistic
data have little utility. If, for example, data
relating to the effectiveness of an innovation 
could not be interpreted as also being valid 
in classrooms other than the ones in which 
they were collected, little would be gained in 
deciding whether to adopt or not. 

Reliability. Here the concern is with the replicability
of the data. If a repetition of the
evaluation did not produce essentially similar 
findings, we should be concerned that the 
findings were simply random and therefore 
meaningless. 

Objectivity. Here concern is with the publicness
of the data. If data are private in the sense
that only particular persons would so interpret
them, i.e., that not all competent judges
would agree on them, their true meaning is
subject to question.

In addition to these four general criteria 
that could be invoked in relation to any infor- 
mation, certain special criteria of practical uti- 
lity must be met by evaluative information. 

These are: 

Relevance. The information must relate to
the decisions to be made.



Significance. The information must be
weighted for its meaning in relation to the
decision. Not all relevant information is
equally weighty. The culling and highlighting 
required is a professional task that justifies 
the inclusion of a reportorial expert on the 
evaluation team. 

Scope. The information must relate to all aspects
involved in the decision. If there are six
alternatives to be considered, information
that applies to only four lacks scope.

Credibility. The information must be trusted
by the decision-maker.

Timeliness. The information must come in
time to be useful to the decision-maker. The 
evaluator must guard against the scientific 
value that argues against publishing findings 
until every last element is in. Late informa- 
tion is worthless information. It is better in 
the evaluative situation to have reasonably 
good information on time than perfect infor- 
mation too late. 

Pervasiveness. The information must get to
all of the audiences (i.e., to all of the decision-makers)
who need it.

Efficiency. It is possible for an evaluation to
mushroom out of all proportions to its value. 
The imprudent evaluator may produce a 
mountain of information whose collection im- 
poses an intolerable financial drain. Proper 
application of the criteria of relevance, signif- 
icance, and scope should remedy the gross- 
est inefficiencies. But even when the infor- 
mation proposed to be collected meets all of 
these criteria, there are probably still alternative
ways for collecting it that differ in terms
of the time, costs, personnel, etc., that are re- 
quired. The criterion of efficiency will guide 
the evaluator to the appropriate alternative. 
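
One way an evaluator might apply these criteria to a proposed design is as a simple checklist. The sketch below is an assumption-laden illustration: the names `unmet_criteria` and `lacks_scope` are hypothetical, and reducing each criterion to a yes/no judgment is a simplification made only for the example.

```python
# The four scientific criteria and the seven criteria of practical utility.
SCIENTIFIC = ("internal validity", "external validity",
              "reliability", "objectivity")
PRACTICAL = ("relevance", "significance", "scope", "credibility",
             "timeliness", "pervasiveness", "efficiency")


def unmet_criteria(judgments):
    """Return the criteria judged not met, or not judged at all."""
    return [c for c in SCIENTIFIC + PRACTICAL if not judgments.get(c, False)]


def lacks_scope(alternatives, covered):
    """Scope: list the decision alternatives the information does not cover."""
    return sorted(set(alternatives) - set(covered))


# Six alternatives are under consideration, but information bears on only four.
missing = lacks_scope(alternatives=range(1, 7), covered=[1, 2, 3, 4])

# A hypothetical appraisal in which only timeliness is in doubt.
appraisal = {criterion: True for criterion in SCIENTIFIC + PRACTICAL}
appraisal["timeliness"] = False
problems = unmet_criteria(appraisal)
```

Here `missing` names the two uncovered alternatives, and `problems` lists the criteria that the design still has to satisfy before it can be accepted.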

An evaluator who can say, after careful
examination, that his evaluation design will
produce information that conforms to all of 
these criteria can be assured that he is doing 



Footnotes 



Webster's New World Dictionary, College
Edition, New York: World Publishing Company,
1966, p. 749.

Leo J. Cronbach. "Course Improvement
Through Evaluation," Teachers College Record,
64 (May 1963), p. 672.

David Braybrooke and Charles E. Lindblom.
A Strategy of Decision, New York: The
Free Press, 1963. This most insightful work is
based on the authors' wide experience in the
arena of public policy decision-making. A signi- 
ficant portion of the book deals with a discus- 
sion of the inadequacies of ordinary formula- 
tions of decision-making processes, which treat 
decision-making as rational. Braybrooke and 
Lindblom instead espouse a strategy of de- 
cision-making which they term "disjointed in- 
crementalism," and which is based on the 
lower left quadrant of Figure 2 (q.v.), called
"Incrementalism" by the present authors. The 
educational situation is sufficiently different 
from that normally encountered in public poli- 
cy arenas to make viable certain decision 
strategies in the upper left and lower right 
quadrants of Figure 2 (homeostasis and neomobilism),
which quadrants Braybrooke and Lindblom
believe have little utility in guiding policy
decision strategies in most cases. It is chiefly in
this regard that the formulations in this paper 
differ from Braybrooke and Lindblom. The 
present authors acknowledge a great indebted- 
ness to them for the concepts of high vs. low 
change and high vs. low understanding, which 
form the basis for the strategies implied in 
Figure 2 and which are further explicated in 
this text.

David L. Clark and Egon G. Guba. "An Examination
of Potential Change Roles in Education."
Paper read at the Symposium on Innovation
in Planning School Curricula, Airlie House,
Virginia, October 1965.

Benjamin S. Bloom. Taxonomy of Educational
Objectives: The Classification of Educational
Goals, Handbook 1: Cognitive Domain,
New York: Longmans, Green and Company, 
Inc., 1956. 

Newton S. Metfessel and William B.
Michael. "A Paradigm Involving Multiple Cri- 
terion Measures for the Evaluation of the Effec- 
tiveness of School Programs." Educational and 
Psychological Measurement, 1967, 27, 
pp. 931-936. 



Ibid., p. 66.

Oscar K. Buros. Mental Measurements
Yearbook, Highland Park, New Jersey:
Gryphon Press, 1938, 1940, 1949, 1953, 1959,
and 1965. 

18 N. L. Gage, Editor. Handbook of Research
on Teaching, The American Educational Research
Association, Chicago: Rand McNally and
Company, 1963. 

Michael Scriven. The Methodology of
Evaluation, Bloomington, Indiana: Indiana Uni- 
versity, Social Science Education Consortium, 
Publication #110, 1965. 



